SportQA: A Benchmark for Sports Understanding in Large Language Models
Xia, Haotian, Yang, Zhengbang, Wang, Yuqing, Tracy, Rhys, Zhao, Yun, Huang, Dongdong, Chen, Zezhi, Zhu, Yan, Wang, Yuan-fang, Shen, Weining
A deep understanding of sports, a field rich in strategic and dynamic content, is crucial for advancing Natural Language Processing (NLP). This holds particular significance in the context of evaluating and advancing Large Language Models (LLMs), given the existing gap in specialized benchmarks. To bridge this gap, we introduce SportQA, a novel benchmark specifically designed for evaluating LLMs in the context of sports understanding. SportQA encompasses over 70,000 multiple-choice questions across three distinct difficulty levels, each targeting a different aspect of sports knowledge, from basic historical facts to intricate, scenario-based reasoning tasks. We conducted a thorough evaluation of prevalent LLMs, mainly utilizing few-shot learning paradigms supplemented by chain-of-thought (CoT) prompting. Our results reveal that while LLMs exhibit competent performance on basic sports knowledge, they struggle with more complex, scenario-based sports reasoning, lagging behind human expertise. The introduction of SportQA marks a significant step forward in NLP, offering a tool for assessing and enhancing sports understanding in LLMs.
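As a concrete illustration, the sketch below shows how a few-shot, chain-of-thought evaluation loop over multiple-choice questions can be wired up. The `query_llm` stub, the example question, and the prompt wording are illustrative assumptions, not the paper's actual harness.

```python
# Minimal sketch of a few-shot chain-of-thought (CoT) evaluation loop for
# multiple-choice sports questions, in the spirit of SportQA. The question
# data, the query_llm stub, and the prompt wording are illustrative
# assumptions, not the paper's actual harness.

def query_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with your model client."""
    return "Reasoning: ... Answer: B"

FEW_SHOT = (
    "Q: Which country hosted the 1998 FIFA World Cup?\n"
    "Options: A) Brazil B) France C) Germany D) Italy\n"
    "Reasoning: The 1998 tournament final was played in Paris. Answer: B\n\n"
)

def evaluate(questions):
    """Score accuracy of few-shot CoT answers on (question, options, gold) triples."""
    correct = 0
    for question, options, gold in questions:
        prompt = (
            FEW_SHOT
            + f"Q: {question}\nOptions: {options}\n"
            + "Reasoning:"  # elicit step-by-step reasoning before the answer
        )
        reply = query_llm(prompt)
        # Take the letter following the final "Answer:" marker.
        predicted = reply.rsplit("Answer:", 1)[-1].strip()[:1]
        correct += predicted == gold
    return correct / len(questions)

if __name__ == "__main__":
    qs = [("Which country hosted the 1998 FIFA World Cup?",
           "A) Brazil B) France C) Germany D) Italy", "B")]
    print(f"accuracy = {evaluate(qs):.2f}")
```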
Navigating the Evaluation Funnel to Optimize Iteration Speed for Recommender Systems
Schultzberg, Claire, Ottens, Brammert
Over the last decades, a rich literature has emerged on the evaluation of recommender systems. However, less has been written about how to efficiently combine different evaluation methods from this rich field into a single efficient evaluation funnel. In this paper we aim to build intuition for how to choose evaluation methods, by presenting a novel framework that simplifies reasoning about the evaluation funnel for a recommender system. Our contribution is twofold. First, we present our framework for decomposing the definition of success to construct efficient evaluation funnels, focusing on how to identify and discard non-successful iterations quickly. We show that decomposing the definition of success into smaller necessary criteria for success enables early identification of non-successful ideas. Second, we give an overview of the most common and useful evaluation methods, discuss their pros and cons, and describe how they fit into, and complement each other within, the evaluation process. We cover so-called offline and online evaluation methods such as counterfactual logging, validation, verification, A/B testing, and interleaving. The paper concludes with general discussion and advice on how to design an efficient evaluation process for recommender systems.
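To make the funnel idea concrete, here is a minimal sketch in which success is decomposed into ordered, necessary gates and an iteration is discarded at the first gate it fails. The gate names and thresholds are invented for illustration and are not taken from the paper.

```python
# Minimal sketch of an evaluation funnel: success is decomposed into ordered,
# necessary criteria, and an iteration is discarded at the first gate it fails.
# The gate names and thresholds are illustrative assumptions, not from the paper.

from typing import Callable, List, Tuple

Gate = Tuple[str, Callable[[dict], bool]]

def run_funnel(candidate: dict, gates: List[Gate]) -> Tuple[bool, str]:
    """Apply cheap gates first; stop at the first necessary criterion that fails."""
    for name, passes in gates:
        if not passes(candidate):
            return False, f"discarded at '{name}'"
    return True, "passed all gates"

gates: List[Gate] = [
    # Cheapest checks first: each is necessary but not sufficient for success.
    ("offline validation", lambda c: c["offline_ndcg"] >= 0.30),
    ("counterfactual estimate", lambda c: c["ips_ctr_lift"] > 0.0),
    ("A/B test", lambda c: c["ab_pvalue"] < 0.05 and c["ab_lift"] > 0.0),
]

candidate = {"offline_ndcg": 0.34, "ips_ctr_lift": 0.002,
             "ab_pvalue": 0.03, "ab_lift": 0.004}
print(run_funnel(candidate, gates))
```

Ordering gates from cheapest to most expensive is what makes the funnel efficient: a failed offline check costs seconds, whereas a failed A/B test costs weeks of traffic.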
A Multi-Fidelity Methodology for Reduced Order Models with High-Dimensional Inputs
Mufti, Bilal, Perron, Christian, Mavris, Dimitri N.
In the early stages of aerospace design, reduced order models (ROMs) are crucial for minimizing the computational costs of using physics-rich field information in many-query scenarios requiring multiple evaluations. The intricacy of aerospace design demands high-dimensional design spaces to capture detailed features and design variability accurately. However, these spaces introduce significant challenges, including the curse of dimensionality, which stems from both high-dimensional inputs and outputs and necessitates substantial training data and computational effort. To address these complexities, this study introduces a novel multi-fidelity, parametric, and non-intrusive ROM framework designed for high-dimensional contexts. It integrates machine learning techniques for manifold alignment and dimension reduction, employing Proper Orthogonal Decomposition (POD) and Model-based Active Subspace, with multi-fidelity regression for ROM construction. Our approach is validated through two test cases: the 2D RAE 2822 airfoil and the 3D NASA CRM wing, assessing combinations of various fidelity levels, training data ratios, and sample sizes. Compared to the single-fidelity PCAS method, our multi-fidelity solution offers improved cost-accuracy benefits and achieves better predictive accuracy with reduced computational demands. Moreover, our methodology outperforms the manifold-aligned ROM (MA-ROM) method by 50% in handling scenarios with large input dimensions, underscoring its efficacy in addressing the complex challenges of aerospace design.
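For intuition, the following is a deliberately simplified sketch of the non-intrusive POD-plus-regression core of such a ROM: compress field snapshots with an SVD, regress design variables onto the retained modal coefficients, and reconstruct the field at a new design point. The synthetic data, the plain least-squares regressor, and all sizes are assumptions; the paper's framework additionally employs manifold alignment, active subspaces, and multi-fidelity regression.

```python
# Highly simplified sketch of a non-intrusive POD-based ROM: compress field
# snapshots with an SVD, regress design variables onto the retained POD
# coefficients, and reconstruct the field at a new design point. The paper's
# framework adds manifold alignment, active subspaces, and multi-fidelity
# regression on top of this; sizes and data below are synthetic.

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_design, n_field, r = 40, 5, 1000, 4

X = rng.uniform(-1.0, 1.0, (n_samples, n_design))        # design variables
basis_true = rng.normal(size=(n_field, r))
Y = (X[:, :r] @ basis_true.T) + 0.01 * rng.normal(size=(n_samples, n_field))

# 1) POD: truncated SVD of the centered snapshot matrix.
Y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
Phi = Vt[:r].T                                           # POD modes (n_field x r)
coeffs = (Y - Y_mean) @ Phi                              # modal coefficients

# 2) Regression from design variables to POD coefficients (least squares here;
#    the paper uses multi-fidelity surrogates instead).
W, *_ = np.linalg.lstsq(np.c_[X, np.ones(n_samples)], coeffs, rcond=None)

def predict(x_new):
    a = np.r_[x_new, 1.0] @ W            # predicted POD coefficients
    return Y_mean + a @ Phi.T            # reconstructed field

err = np.linalg.norm(predict(X[0]) - Y[0]) / np.linalg.norm(Y[0])
print(f"relative reconstruction error: {err:.3e}")
```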
Large Generative AI Models for Telecom: The Next Big Thing?
Bariah, Lina, Zhao, Qiyang, Zou, Hang, Tian, Yu, Bader, Faouzi, Debbah, Merouane
The evolution of generative artificial intelligence (GenAI) constitutes a turning point in reshaping the future of technology. Wireless networks in particular, with the blooming of self-evolving networks, represent a rich field for exploiting GenAI and reaping benefits that can fundamentally change how wireless networks are designed and operated today. Specifically, large GenAI models are envisioned to open up a new era of autonomous wireless networks, in which multi-modal GenAI models trained over various Telecom data can be fine-tuned to perform several downstream tasks, eliminating the need to build and train dedicated AI models for each specific task and paving the way for the realization of artificial general intelligence (AGI)-empowered wireless networks. In this article, we aim to unfold the opportunities of integrating large GenAI models into the Telecom domain. In particular, we first highlight the applications of large GenAI models in future wireless networks, defining potential use cases and revealing insights into the associated theoretical and practical challenges. Furthermore, we unveil how 6G can open up new opportunities by connecting multiple on-device large GenAI models, paving the way to a collective intelligence paradigm. Finally, we put forward a vision of how large GenAI models will be key to realizing self-evolving networks.
How Much Is Hidden in the NAS Benchmarks? Few-Shot Adaptation of a NAS Predictor
Loya, Hrushikesh, Dudziak, Łukasz, Mehrotra, Abhinav, Lee, Royson, Fernandez-Marques, Javier, Lane, Nicholas D., Wen, Hongkai
Neural architecture search (NAS) has proven to be a powerful approach to designing and refining neural networks, often boosting their performance and efficiency over manually-designed variations, but it comes with computational overhead. While a considerable amount of research has focused on lowering the cost of NAS for mainstream tasks such as image classification, many of those improvements stem from the fact that those tasks are well studied in the broader context. Consequently, applying NAS to emerging and under-represented domains still carries a relatively high cost and/or uncertainty about the achievable gains. To address this issue, we turn our focus to the recent growth of publicly available NAS benchmarks in an attempt to extract general NAS knowledge that is transferable across different tasks and search spaces. We borrow from the rich field of meta-learning for few-shot adaptation and carefully study the applicability of those methods to NAS, with a special focus on the relationship between task-level correlation (domain shift) and predictor transferability, which we deem critical for improving NAS on diverse tasks. In our experiments, we use 6 NAS benchmarks in conjunction, spanning 16 NAS settings in total -- our meta-learning approach not only shows superior (or matching) performance in the cross-validation experiments but also extrapolates successfully to a new search space and tasks.
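As a rough sketch of the few-shot adaptation idea, the code below meta-trains a linear performance predictor with a Reptile-style outer loop over synthetic tasks and then adapts it to a new task with a handful of gradient steps. The predictor, the tasks, and the hyperparameters are stand-ins, not the paper's setup.

```python
# Minimal Reptile-style sketch of meta-learning a performance predictor across
# NAS "tasks", then few-shot adapting it to a new task. The linear predictor,
# synthetic tasks, and hyperparameters are illustrative assumptions; they stand
# in for the paper's predictor and the 6 NAS benchmarks.

import numpy as np

rng = np.random.default_rng(1)
n_feat = 8  # architecture encoding size

def make_task():
    """Synthetic task: accuracy = w . encoding + noise, with task-specific w."""
    w = rng.normal(size=n_feat)
    def sample(n):
        X = rng.normal(size=(n, n_feat))
        return X, X @ w + 0.05 * rng.normal(size=n)
    return sample

def sgd_steps(theta, sample, steps=20, lr=0.02, batch=16):
    """Inner loop: plain SGD on squared error for one task."""
    for _ in range(steps):
        X, y = sample(batch)
        grad = 2 * X.T @ (X @ theta - y) / batch
        theta = theta - lr * grad
    return theta

# Outer loop (Reptile): nudge the meta-initialisation toward task solutions.
theta = np.zeros(n_feat)
for _ in range(300):
    task = make_task()
    theta += 0.1 * (sgd_steps(theta.copy(), task) - theta)

# Few-shot adaptation on a new task: a handful of inner steps from theta.
new_task = make_task()
adapted = sgd_steps(theta.copy(), new_task, steps=5)
X_test, y_test = new_task(200)
print("few-shot MSE:", np.mean((X_test @ adapted - y_test) ** 2))
```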
Interpretable Machine Learning for Discovery: Statistical Challenges & Opportunities
Allen, Genevera I., Gan, Luqin, Zheng, Lili
Machine learning systems have gained widespread use in science, technology, and society. Given the increasing number of high-stakes machine learning applications and the growing complexity of machine learning models, many have advocated for interpretability and explainability to promote understanding and trust in machine learning results (Rasheed et al., 2022; Toreini et al., 2020; Broderick et al., 2023). In response, there has been a recent explosion of research on Interpretable Machine Learning (IML), mostly focusing on new techniques to interpret black-box systems; see Molnar (2022), Lipton (2018), Guidotti et al. (2018), Doshi-Velez & Kim (2017), Du et al. (2019), Murdoch et al. (2019), and Carvalho et al. (2019) for recent reviews of the IML and explainable artificial intelligence literature. While most of these interpretability techniques were not necessarily designed for this purpose, they are increasingly being used to mine large and complex data sets for new insights (Roscher et al., 2020). These so-called data-driven discoveries are especially important for advancing data-rich fields in science, technology, and medicine. While prior reviews focus mainly on IML techniques, we primarily review how IML methods promote data-driven discoveries, the challenges associated with this task, and related new research opportunities at the intersection of machine learning and statistics. In the sciences and beyond, IML techniques are routinely employed to make new discoveries from large and complex data sets; to motivate our review on this topic, we highlight several examples. First, feature importance and feature selection in supervised learning are popular forms of interpretation that have led to major discoveries such as new genomic biomarkers of diseases (Guyon et al., 2002), physical laws governing dynamical systems (Brunton et al., 2016), and lesions and other abnormalities in radiology (Borjali et al., 2020; Reyes et al., 2020). While most of the IML literature focuses on supervised learning (Molnar, 2022; Lipton, 2018; Guidotti et al., 2018; Doshi-Velez & Kim, 2017), there have been many major scientific discoveries made via unsupervised techniques, and we argue that these approaches
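To ground the feature-importance example, here is a minimal permutation-importance sketch on synthetic data: permuting an informative feature inflates the model's error, while permuting a noise feature barely moves it. The model and data are illustrative; in a discovery setting one would also check the stability of the resulting ranking.

```python
# Minimal sketch of permutation feature importance, one of the supervised IML
# techniques used for discovery (e.g. ranking candidate biomarkers). The model
# and data are synthetic stand-ins; with real data you would also assess the
# stability of the ranking before claiming a discovery.

import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=n)  # features 0, 2 matter

# Fit a simple least-squares model as the "black box" to interpret.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
baseline = np.mean((X @ beta - y) ** 2)

importances = []
for j in range(p):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature-target link
    importances.append(np.mean((X_perm @ beta - y) ** 2) - baseline)

for j, imp in enumerate(importances):
    print(f"feature {j}: MSE increase {imp:.3f}")
```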
Reduce, Reuse, Recycle: Selective Reincarnation in Multi-Agent Reinforcement Learning
Formanek, Claude, Tilbury, Callum Rhys, Shock, Jonathan, Tessera, Kale-ab, Pretorius, Arnu
'Reincarnation' in reinforcement learning has been proposed as a formalisation of reusing prior computation from past experiments when training an agent in an environment. In this paper, we present a brief foray into the paradigm of reincarnation in the multi-agent (MA) context. We consider the case where only some agents are reincarnated, whereas the others are trained from scratch -- selective reincarnation. In the fully-cooperative MA setting with heterogeneous agents, we demonstrate that selective reincarnation can lead to higher returns than training fully from scratch, and faster convergence than training with full reincarnation. However, the choice of which agents to reincarnate in a heterogeneous system is vitally important to the outcome of the training -- in fact, a poor choice can lead to considerably worse results than the alternatives. We argue that a rich field of work exists here, and we hope that our effort catalyses further energy in bringing the topic of reincarnation to the multi-agent realm.
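Below is a minimal sketch of what selective reincarnation looks like at initialisation time, assuming per-agent policies were saved in a prior experiment; the policy representation and loading mechanics are invented for illustration and not tied to any particular MARL library.

```python
# Minimal sketch of selective reincarnation: at training start, a chosen subset
# of agents is initialised from policies saved in a previous experiment while
# the rest start from scratch. The policy representation and loading mechanics
# are illustrative assumptions, not tied to any specific MARL library.

import numpy as np

rng = np.random.default_rng(3)
AGENTS = ["agent_0", "agent_1", "agent_2"]
OBS_DIM, ACT_DIM = 10, 4

def fresh_policy():
    """Random initialisation, standing in for training from scratch."""
    return rng.normal(scale=0.01, size=(OBS_DIM, ACT_DIM))

# Pretend these weights were saved at the end of a prior experiment.
saved_policies = {name: rng.normal(size=(OBS_DIM, ACT_DIM)) for name in AGENTS}

def build_team(reincarnate):
    """Reincarnated agents reuse prior computation; others are re-initialised.

    As the paper stresses, which subset goes into `reincarnate` can matter
    more than how many agents it contains.
    """
    return {
        name: saved_policies[name].copy() if name in reincarnate else fresh_policy()
        for name in AGENTS
    }

team = build_team(reincarnate={"agent_0", "agent_2"})
for name, weights in team.items():
    print(name, "reincarnated" if np.abs(weights).mean() > 0.1 else "from scratch")
```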
RFC-Net: Learning High Resolution Global Features for Medical Image Segmentation on a Computational Budget
Saha, Sourajit, Saha, Shaswati, Gani, Md Osman, Oates, Tim, Chapman, David
Learning high-resolution representations is essential for semantic segmentation. Convolutional neural network (CNN) architectures with downstream and upstream propagation flow are popular for segmentation in medical diagnosis. However, because they perform spatial downsampling and upsampling in multiple stages, information loss is unavoidable. Conversely, connecting layers densely at high spatial resolution is computationally expensive. In this work, we devise a Loose Dense Connection Strategy to connect neurons in subsequent layers with reduced parameters. On top of that, using an m-way tree structure for feature propagation, we propose the Receptive Field Chain Network (RFC-Net), which learns high-resolution global features in a compressed computational space. Our experiments demonstrate that RFC-Net achieves state-of-the-art performance on the Kvasir and CVC-ClinicDB benchmarks for polyp segmentation.
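The following PyTorch sketch is one plausible reading of a "loose" dense connection: each layer concatenates only a strided subset of earlier feature maps instead of all of them, cutting input channels and hence parameters. It is an interpretation of the abstract, not the released RFC-Net architecture.

```python
# Hedged sketch contrasting dense connectivity with a "loose" variant in which
# each layer concatenates only a strided subset of earlier feature maps,
# reducing parameters. This is an interpretation of the abstract's Loose Dense
# Connection Strategy, not the released RFC-Net architecture.

import torch
import torch.nn as nn

class LooseDenseBlock(nn.Module):
    def __init__(self, channels: int, depth: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        self.convs = nn.ModuleList()
        for i in range(depth):
            # A fully dense block would take (i + 1) * channels inputs here;
            # the loose variant keeps only every `stride`-th earlier feature map.
            n_in = channels * len(range(0, i + 1, stride))
            self.convs.append(nn.Conv2d(n_in, channels, kernel_size=3, padding=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.convs:
            selected = feats[::self.stride]          # loose (strided) connections
            out = torch.relu(conv(torch.cat(selected, dim=1)))
            feats.append(out)
        return feats[-1]

block = LooseDenseBlock(channels=16, depth=4)
print(block(torch.randn(1, 16, 64, 64)).shape)  # -> torch.Size([1, 16, 64, 64])
```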
How AI & Robotics are paving the path toward a new supply chain world
AI successes are no longer science fiction. With the help of surrounding technologies such as graphics cards, cloud computing, and accessible programming languages, the integration of AI (Artificial Intelligence) and RPA (Robotic Process Automation) within the supply chain field is finally happening. In addition, governments and organizations have started to regulate these technologies. We will not dive into the technical details of each; instead, we explain the relationship within this triad and how the three share the same sustainable impact in today's world.
Fully differentiable model discovery
Model discovery aims to autonomously discover the differential equations underlying a dataset. Approaches based on Physics-Informed Neural Networks (PINNs) have shown great promise, but a fully differentiable model that explicitly learns the equation has remained elusive. In this paper we propose such an approach by combining neural-network-based surrogates with Sparse Bayesian Learning (SBL). We start by reinterpreting PINNs as multitask models, applying multitask learning using uncertainty, and show that this leads to a natural framework for including Bayesian regression techniques. We then construct a robust model discovery algorithm using SBL, which we showcase on various datasets. Concurrently, the multitask approach allows the use of probabilistic approximators, and we show a proof of concept using normalizing flows to learn a density model directly from single-particle data. Our work extends PINNs to various types of neural network architectures and connects neural-network-based surrogates to the rich field of Bayesian parameter inference.
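As a pared-down illustration of the library-regression step at the heart of model discovery, the sketch below uses sequentially-thresholded least squares as a simple stand-in for the paper's Sparse Bayesian Learning; in the paper, a neural-network surrogate rather than finite differences would supply noise-robust derivatives.

```python
# Minimal sketch of the sparse-regression step at the heart of model discovery:
# regress time derivatives on a library of candidate terms and prune small
# coefficients. Sequentially-thresholded least squares is used here as a simple
# stand-in for the paper's Sparse Bayesian Learning, and a neural surrogate
# would normally supply the (noise-robust) derivatives.

import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0.0, 5.0, 200)
x = np.exp(-0.5 * t) + 0.002 * rng.normal(size=t.size)   # data from dx/dt = -0.5 x
dxdt = np.gradient(x, t)                                  # crude derivative estimate

# Candidate library: the true equation uses only the linear term.
library = np.column_stack([np.ones_like(x), x, x**2, x**3])
names = ["1", "x", "x^2", "x^3"]

xi, *_ = np.linalg.lstsq(library, dxdt, rcond=None)
for _ in range(10):                                       # threshold-and-refit loop
    small = np.abs(xi) < 0.1
    xi[small] = 0.0
    big = ~small
    if big.any():
        xi[big], *_ = np.linalg.lstsq(library[:, big], dxdt, rcond=None)

print("discovered dx/dt =",
      " + ".join(f"{c:.3f}*{n}" for c, n in zip(xi, names) if c != 0.0))
```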